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ABSTRACT 

Wavelets are a new and powerful mathematical tool, whose most celebrated appli- 
cations are data compression and de-noising. In Paper I (Romeo, Horellou & Bergh 
2003), we have shown that wavelets can be used for removing noise efficiently from 
cosmological, galaxy and plasma iV-body simulations. The expected two-orders-of- 
magnitude higher performance means, in terms of the well-known Moore's law, an 
advance of more than one decade in the future. In this paper, we describe a wavelet 
add-on code designed for such an application. Our code can be included in common 
grid-based iV-body codes, is written in Fortran, is portable and available on request 
from the first author. The code can also be applied for removing noise from standard 
data, such as signals and images. 

Key words: plasmas - methods: A^-body simulations - methods: numerical - galax- 
ies: general - galaxies: kinematics and dynamics - cosmology: miscellaneous. 



1 INTRODUCTION 

In A''-body simulations the number of particles A*' cannot 
generally be set equal to the number of bodies of the real 
system, but is dictated by the available computer power. 
A simulation with, say, ten times more particles demands 
at least one order of magnitude more computational time, 
memory and storage. Because of such a limitation, A'^ is gen- 
erally several orders of magnitude smaller than required. A 
small A'" means that the statistical fluctuations of particle 
positions and velocities are artificially enhanced, and so are 
coUisional effects. This is dangerous because collisions affect 
the ability of resonances to damp or amplify perturbations, 
which in turn affects the formation of structures and the 
dynamical evolution of the model. Thus an effective method 
of noise reduction is required. 

The standard way to reduce noise in A'^-body simula- 
tions is to soften the interparticle force at short distances, 
either directly or using flnite-sized particles (e.g., Romeo 
1994, 1997, 1998a, b; Dehnen 2001; see also Byrd 1995). On 
the other hand, softening reduces noise only in part. The ini- 
tial conditions imposed on particle positions and velocities 
are also relevant. Basically, noise can be suppressed at the 
beginning of the simulation by sampling phase space reg- 
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ularly (quiet starts), rather than randomly (noisy starts), 
consistent with the distribution function of the model (e.g., 
Dawson 1983; Birdsall & Langdon 1991; Knebe, Green & 
Binney 2001). Even so, noise will develop during the sim- 
ulation. In fact, quiet starts impose an initial order on the 
model. But the model will react to such a state of low en- 
tropy and follow the natural tendency of physical systems 
towards thermalization. The development of noise is medi- 
ated by instabilities, which amplify and randomize the ini- 
tial correlations arising from the discrete regular sampling 
of phase space. 

The effects of noise are subtle and not yet fully under- 
stood. Today, half a century after the first A'-body simula- 
tions, there is still an intense debate. Noise is a crucial issue 
for simulations of structure formation in the early Universe 
(e.g.. Splinter et al. 1998; Hamana, Yoshida & Suto 2002; 
Power et al. 2003; Binney 2004; Diemand et al. 2004; Sylos 
Labini, Baertschiger & Joyce 2004), and for galaxy simula- 
tions (e.g., Pfenniger 1993; Pfenniger & Friedli 1993; Wein- 
berg & Katz 2002; O'Neill & Dubinski 2003; Valenzuela & 
Klypin 2003). Noise is an important issue not only for cos- 
mology and astrophysics but also for plasma and accelera- 
tor physics, where simulations are used for technological ap- 
plications such as fusion and charged particle beams (e.g., 
Dawson 1983; BirdsaU & Langdon 1991; Arter 1995; Kan- 
drup 2003). The noise problem is acute and awaits solution. 
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Wavelets are the state-of-the-art technique used for 
noise reduction in digital signal/image processing (see, e.g., 
Mallat 1998; Bergh, Ekstedt & Lindberg 1999; for tradi- 
tional techniques such as data averaging or Wiener filter- 
ing see, e.g., Gonzalez & Woods 2002). Wavelets have an 
intrinsic ability to compress the signal into few large coeffi- 
cients, so that noise can be removed with a proper thresh- 
olding. Being intrinsic, their ability is independent of general 
properties of the data such as the number of dimensions or 
the presence of symmetries. Wavelet de-noising is very ef- 
fective: it outperforms traditional techniques of noise reduc- 
tion and the algorithm is even faster than the fast Fourier 
transform. The second aspect is especially important for our 
application since speed is a primary factor in A^-body simu- 
lations. We recommend the following literature: for the con- 
tiimous and fast wavelet transforms see Addison (2002) and 
Goedecker (1998), respectively; for physical applications see 
again Goedecker (1998) and the beautiful book by van den 
Berg (2004). 

In Paper I, we have pioneered the first application of 
wavelet de-noising to A'^-body simulations. Our method has 
been subjected to several hard tests. The conclusion is that 
it can make the simulation equivalent to a simulation with 
two orders of magnitude more particles. The implications 
are clear. 

In the present paper, we show that our method even 
allows controlling the effectiveness of de-noising: the simu- 
lation can be made equivalent to a simulation with F times 
more particles, where F is assigned by the simulator before- 
hand. Such a degree of freedom can be exploited for un- 
derstanding the effects of noise more thoroughly. Besides, 
we describe the code that implements our method. It is an 
add-on code, and as such is meant to be included in the sim- 
ulator's A^-body code. This is simple if the A''-body code is 
of particle-mesh type. Our code can also be used by itself for 
de-noising standard data, such as signals and images. It is 
written in Fortran, is portable and available on request from 
the first author. Last but not least, we have the ambition 
to provide a reader-friendly and self-contained discussion of 
wavelet de-noising, from the basics to the most advanced 
aspects of our problem. For further reading see Paper I and 
the literature already recommended. 

The rest of our paper is organized as follows. Wavelets 
and wavelet applications are overviewed in Sects 2 and 3, 
respectively. Sect. 3.2 not only discusses the basics of de- 
noising, but also explains the steps of the de-noising algo- 
rithm in our code. These are then discussed in detail in 
Sect. 4. After such a tour of the code, we explore advanced 
de- noising in Sect. 5. There we learn how to control the ef- 
fectiveness of de-noising, and how this part of the method is 
implemented in the code; besides, we discuss further aspects 
of the problem. Practical points concerning the use of the 
code are discussed in Sect. 6. Finally, the conclusions are 
drawn in Sect. 7. 



2 BASICS OF WAVELETS 

2.1 The fundamental property of wavelets 

Data such as signals, images and those arising from the nu- 
merical solution to physical problems generally enclose infor- 



mation on various scales. In order to extract such informa- 
tion, we should be able to separate small-scale features from 
large-scale features and to understand their contributions 
to the overall structure of the data. The classical technique 
used for this purpose is the Fourier transform, which encodes 
the original time/space information into frequency content 
of the data, the frequency being roughly the inverse of the 
relevant scale. But the Fourier transform runs into a serious 
difficulty: it loses all information about the time/space lo- 
calization of a given frequency component. This is nothing 
but the consequence of the Heisenberg uncertainty principle 
in the context of data processing. 

The traditional way to overcome this difficulty is to lo- 
calize the complex sinusoid of the transform multiplying it 
by a window function, a Gaussian for example, which is then 
translated across the data. For a window of given shape, its 
width determines not only the time/space resolution but 
also the frequency resolution, again as a consequence of the 
Heisenberg uncertainty principle. A narrow window gives a 
good time/space resolution but a bad frequency resolution, 
and vice versa for a wide window. So how should we choose 
the width of the window? If we choose it comparable to 
the smallest scale of interest, the time/space resolution of 
course matches the data, but the frequency localization is 
too poor to resolve the low frequencies characterizing large- 
scale features. And if we choose a wider window so as to 
have a finer frequency resolution, the time/space resolution 
gets too coarse to analyse small-scale features. Thus even the 
windowed Fourier transform runs into a difficulty: it has a 
fixed time/space- frequency resolution [constant bandwidth, 
in the language of data processing; cf. Paper I, fig. 1 (left- 
hand panel)]. 

Wavelets are a multi-scale method that overcomes this 
difficulty. Their fundamental property is to provide an adap- 
tive time/space- frequency resolution, in the sense that the 
uncertainty in frequency is proportional to the frequency it- 
self [constant relative bandwidth, in the language of data 
processing; cf. Paper I, fig. 1 (right-hand panel)]. In other 
words, this means that small-scale features of the data are 
analysed with fine resolution in time/space and coarse res- 
olution in frequency, as is natural, and vice versa for large- 
scale features. 

2.2 Wavelet transform 

In order to provide an adaptive time/space- frequency reso- 
lution, the wavelet analysis involves localized wave- like func- 
tions, which are contracted or dilated over the relevant range 
of scales and translated across the data. On the other hand, 
there are several ways to carry out the analysis, depend- 
ing on whether the data are continuous or discrete; and, in 
the discrete case, depending on technical factors. Here we 
present the wavelet transform that is most appropriate for 
our application, which is also the one most commonly used 
for compressing and de-noising discrete data. 

The contributions of small-scale and large-scale features 
are singled out with an iterative procedure. The first step 
consists of separating the smallest-scale features from the 
others. It is done by passing the data through a high-pass 
filter and a complementary low-pass filter. These filters are 
the discrete counterparts of the analysing functions of the 
transform, the wavelet •il){x) and the scaling function </>(a;) 
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Figure 1. Various wavelets ■>p{x) and scaling functions (f>{x). The wavelets 'haar' and 'daub4' are orthogonal. The wavelet pairs 'bior 4.4' 
and 'rbio 6.8' are bi-orthogonal and also quasi-orthogonal, hence for each of them the decomposition ('dec') and reconstruction ('rec') 
wavelets are distinct but similar. Analogous considerations apply to the scaling functions. 



respectively, and are constructed with a mathematical tech- 
nique known as multi-resolution analysis. Filtering produces 
redundant information, since each set of filtered data has the 
same size as the original data. Redundancy is avoided by re- 
jecting every other point of the filtered data. It is well known 
that down-sampling produces aliasing in the context of the 
Fourier transform, but the filters of the wavelet transform 
are constructed in such a way as to eliminate it. The sec- 
ond step consists of separating the features that appear on 
a scale twice as large as in the first step. It is done by re- 
garding the low-pass filtered and down-sampled data as new 
input data, and by analysing them as in the first step. The 
procedure continues until the largest-scale features are also 
separated. In summary, the wavelet transform decomposes 
the original data into a coarse approximation and a sequence 
of finer and finer details, keeping the total size of the data 
constant (cf. Paper I, fig. 2). We can draw an analogy with 
art and say that the approximation gives an 'Impressionist' 
view of the data! 

The original data can be reconstructed with the inverse 
wavelet transform. The coarsest approximation and detail 
are up-sampled, filtered and added. Here up-sampling means 
inserting zeros between the data points, and the filters are 
closely related to the decomposition filters so as to eliminate 
aliasing. The output is a finer approximation, which is then 
combined with the corresponding finer detail as above, and 
the procedure is iterated. In practice, the wavelet synthesis 
is carried out for reconstructing data that have been pro- 
cessed in wavelet domain, such as in data compression and 
de-noising. 

The great success of the wavelet transform arises not 
only from its adaptive time/space- frequency resolution, but 
also from its speed: it is even faster than the fast Fourier 
transform. Given data of size A^d = 2"' (J positive integer) 
and filters of effective size 2AI, the fast wavelet transform 
is computed with iMNd arithmetic operations, where M is 
a positive integer independent of A^d and typically smaller 
than ten. In contrast, the fast Fourier transform has com- 



plexity 2Nd logj A^d . Note that such an efficiency follows di- 
rectly from the non-redundancy of the transform. Similar 
considerations apply to the inverse fast wavelet transform. 



2.3 Wavelet properties 

In contrast to classical transforms, where the analysing func- 
tions belong to a single class and are defined analytically, 
there are dozens of wavelet families and their members are 
generally defined numerically through the associated filters. 
Why do we have so many choices? Because, even though 
the fast wavelet transform has an adaptive time/space- 
frequency resolution, there are various ways to optimize the 
trade-off between time/space and frequency localizations, 
and different conditions can be imposed. In other words, 
wavelets are not all equivalent in applications and, if we 
want to choose the optimal wavelet for a given problem, we 
must understand their properties well. The properties of the 
scaling functions are determined by those of the wavelets, 
but are themselves less relevant. The wavelet properties are: 
size of support, symmetry, number of vanishing moments, 
regularity and (bi-)orthogonality. 

Size of support. The support of a wavelet is the interval 
where the wavelet is non-zero. Its size determines not only 
the time/space localization of the wavelet, but also the speed 
of the transform. 

Symmetry. Symmetry also influences the quality of time/- 
space localization. For example, an asymmetric wavelet can 
be regarded as giving a location with asymmetric error bars. 

Number of vanishing moments. A wavelet ip{x) has n van- 
ishing moments when 



x''il;{x)dx = for v — 0,1, 



(1) 



where x denotes time or space. In particular, all 'normal' 
wavelets have zero mean (n = 1) since, under rather general 
assumptions, this is related to the admissibility condition for 
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the existence of the inverse transform. The number of van- 
ishing moments affects the frequency localization. In fact, 
the Fourier transform of a wavelet with n vanishing mo- 
ments peaks at a characteristic frequency and decays as k" 
towards the origin, where k denotes frequency. 

Regularity. Regularity also affects the frequency localiza- 
tion. In fact, the Fourier transform of a wavelet that is con- 
tinuous together with its first n — 1 derivatives decays as 
^-(n-i-i) ^Q^ards infinity. 

(Bi-) Orthogonality. The orthogonality property concerns 
the set of wavelets defining the transform, that is the set of 
scaled and translated versions of the basic wavelet. It means 
that such wavelets form an orthogonal basis. The alterna- 
tive bi-orthogonality property means that the decomposition 
and reconstruction wavelets form two distinct bases, which 
are mutually orthogonal. Note that (bi-)orthogonality is in- 
timately related to the non-redundancy of the transform. 

It follows that a good time/space localization requires 
a small support and high symmetry, and a good frequency 
localization requires many vanishing moments and high reg- 
ularity. A small support is also needed for a faster transform. 
On the other hand, the wavelet properties are interrelated. A 
small support implies relatively few vanishing moments and 
low regularity. In addition, orthogonality implies asymme- 
try, except for the simplest wavelet. Bi-orthogonality weak- 
ens the coupling between the properties of the decomposi- 
tion and reconstruction wavelets, and allows perfect sym- 
metry. This means that the requirements above cannot be 
satisfied equally well. In order to choose a good wavelet, we 
should then know their relative importance, which depends 
on the application. 



2.4 What do wavelets look like? 

Let us now illustrate what wavelets and scaling functions 
look like. (Recall that the scaling functions are the contin- 
uous counterparts of the low-pass filters of the fast wavelet 
transform and its inverse; see Sect. 2.2, and also Sect. 2.3.) 
Fig. 1 shows various representatives. The wavelets 'haar' and 
'daub 4' belong to the family of Daubechies wavelets, and are 
simple (see Daubechies 1992; here we use the same names 
as in the code). These wavelets are orthogonal; and they 
have few vanishing moments, small support, low regularity 



and no symmetry ('haar' is an exception). The wavelet pairs 
'bior4.4' and 'rbio6.8' belong to the family of bi-orthogonal 
spline wavelets and to its reverse, respectively, and are more 
advanced (see Daubechies 1992; here we use the same names 
as in the code). Such wavelet pairs are not only bi-orthogonal 
but also quasi-orthogonal: for each pair the decomposition 
and reconstruction wavelets are distinct but similar. In ad- 
dition, they have more vanishing moments, larger support, 
higher regularity and perfect symmetry. 



3 BASICS OF WAVELET APPLICATIONS 
3.1 Data compression 

The adaptive time/space-frequency resolution and the non- 
redundancy of the fast wavelet transform have an important 
implication: given regular data, most information present in 
them gets concentrated into few large wavelet coefficients. In 
practice, this means that we can set all the other coefficients 
to zero and get back data almost identical to the original 
ones. This is the idea behind data compression. 

The compression ability can be quantified by the com- 
pression factor CF and the loss of information LI, defined 
as: 



CF - 



LI\ 



Nw 

TLyj 

= 100 I 1 - 



(2) 



(3) 



where Nw is the number of wavelet coefficients Wi (and 
is equal to the number of data points A^d), and n„ is the 
number of wavelet coefficients Wi that are not set to zero. 
Clearly, there are also visual criteria for judging the quality 
of the compressed data. 

The most important requirement for a good compres- 
sion ability (large CF and small LI) is that the decompo- 
sition wavelet should have many vanishing moments, and 
the basic reason is the following. A wavelet with n vanish- 
ing moments is insensitive to polynomials of degree n — 1. 
Regular data behave approximately as such polynomials in a 
neighbourhood of a given point. Hence the wavelet only feels 
the deviation from such behaviour, which decreases with n. 
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Figure 3. De-noising in action, a: data with Poissonian noise 
and the perfectly non-noisy model data, b: pre-processed data, c: 
wavelet coefficients computed by fast wavelet transforming, after 
the choice of the wavelet; note that there are few large wavelet 
coefficients and many small wavelet coefficients, d: original and 
thresholded wavelet coefficients; the threshold is also shown, e: 
data computed by inverse fast wavelet transforming, f : de-noised 
data computed by post-processing vs. the model data. The signal- 
to-noise ratio is SNR ~ 8.8 in the noisy data and SNR ~ 73.9 in 
the de-noised data, hence the de-noising factor is DF ~ 8.4. (For 
more information see the text.) 



Thus a large n means that the detail coefficients tend to be 
small and this implies a potentially good compression abil- 
ity. On the other hand, a good compression ability is guar- 
anteed only if the decomposition wavelet has a sufficiently 
small support. Basically, the analysis should be local enough 
otherwise the deviation from polynomial behaviour of the 
data becomes significant. (For a more explicit condition see 
Sect. 4.2.) Lastly, high regularity and symmetry are mainly 
needed by the reconstruction wavelet for a good quality of 
the compressed data. The conclusion is that bi-orthogonal 
wavelets represent the best alternative for satisfying the re- 
quirements above. 

Let us then consider a beautiful image of disc galaxies 
and choose an appropriate wavelet pair, 'rbio2.8', which be- 
longs to the family of reverse bi-orthogonal spline wavelets 
(see Daubechies 1992; here we use the same name as in the 
code). Fig. 2 illustrates the example eloquently. 



3.2 De- noising: data and simulations 

The compression ability of the fast wavelet transform has a 
further important implication: given noisy data, the under- 
lying regular part gets mostly concentrated into few large 
wavelet coefficients, whereas noise is mostly mapped into 
many small wavelet coefficients. In practice, this means that, 
if we identify a correct threshold, then we can set all the 
small coefficients to zero and get back data almost decon- 
taminated from noise. This is the idea behind data de- 
noising: a rigorous way to compress noisy data. In this sec- 
tion, we go on discussing the basics of de-noising. A more 
detailed discussion is given in Sect. 4. The identification of 
a correct threshold, which is crucial to the whole process of 
de-noising, is discussed in Sect. 4.4. 

Let us now illustrate the basics of de-noising in a con- 
crete case (cf. Fig. 3). Fig. 3a shows data with Poissonian 
noise and the perfectly non-noisy model data.^ This type of 
noise is characterized by a multivariate Poissonian proba- 
bility distribution, and hence the local standard deviation 
of the data is equal to the square root of their local mean: 
c"ioc = y^Mioc (see, e.g., Bevington & Robinson 1992). Poisso- 
nian noise occurs in all experiments and observations where 
the data represent 'counts' in a set of bins. Fig. 3b shows that 
the noisy data are pre-processed so as to transform Poisso- 
nian noise into Gaussian white noise, where 'white' means 
that it is equally significant on all scales (constant power 
spectrum). This type of noise is well known for its mathe- 
matical tractability (see, e.g., Gonzalez & Woods 2002). In 
fact, the reason for pre-processing the data is that Gaussian 
white noise is convenient for identifying a correct threshold. 
Figs 3c-f show the remaining route after the choice of the 
wavelet: fast wavelet transforming (Fig. 3c); thresholding 
the wavelet coefficients, which is the heart of de-noising (Fig. 
3d); inverse fast wavelet transforming (Fig. 3e); and, finally, 
post-processing the data, which is needed after the initial 

^ In addition, it turns out that the first n 'multipole' moments 
of the data are conserved, starting from the zeroth-order one, if 
no approximation coefficient is set to zero. This is particularly 
meaningful when the data represent a mass or a charge distribu- 
tion. 

^ Poissonian data can be generated using the numerical recipes 
by Press et al. (1992). 
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Figure 4. Original and thresholded wavelet coefficients (top), and de-noised data vs. the model data (bottom) for various choices of 
the wavelet. Also specified are the signal-to-noise ratio SNR in the de-noised data and the de-noising factor DF. The reference case 
illustrated in Fig. 3 corresponds to 'bior4.4'. 



pre-processing (Fig. 3f). The de-noised data are shown vs. 
the model data. 

As in Fig. 3 the model data are known, the de-noising 
ability can be quantified by the signal-to-noise ratio SNR 
and the de-noising factor DF, defined as: 



SNR : 



DF 



{SNR)i, 



1/2 



[SNR),, 



(4) 



(5) 



where Xi are the model data and Yi are either the noisy 
data ('no') or the de-noised data ('de'). In addition, {SNR)da 
means the inverse of an appropriately defined estimation er- 
ror. Clearly, there are also visual criteria for judging the 
quality of the de-noised data. Fig. 3 illustrates the improve- 
ment produced by de-noising expressively. In general, the 
model data are not known so the de-noising ability and the 
quality of the de-noised data are difficult to estimate. 

2-D or 3-D data de-noising is similar to the 1-D case. 
The differences are discussed together with other details of 
de-noising (see Sect. 4). 

How does de-noising work for N-body simulations? The 
de-noising method discussed here applies to discrete data 
so it is natural to consider grid-based A''-body simulations. 
Such simulations use a grid for tabulating the particle den- 
sity, and for computing the potential and the field (see, e.g., 
Hockney & Eastwood 1988). The number of particles n in 
each cell shows fluctuations \Sn\/{n) ~ {n)~^^'^ with respect 
to an average (n). This means that the particle distribution 
is polluted by noise that is basically Poissonian, whereas the 
noise induced in the potential and in the fleld is of a more 
complex nature. Using such a method we can thus de-noise 
the particle distribution at each timestep and make the sim- 
ulation equivalent to a simulation with many more particles. 
This is the idea behind our application (cf. Paper I). 



4 TOUR OF THE CODE 

As we have explained in Sect. 3.2, de-noising standard data 
and Ai'-body simulations consists of the following processes: 
pre-processing of the data, choice of the wavelet, fast wavelet 
transform, thresholding of the wavelet coefficients, inverse 
fast wavelet transform and post-processing of the data. 
These are also the steps of the de-noising algorithm in our 
code. In this section, we discuss them in detail. Advanced 
aspects of de-noising are discussed in Sect. 5. 



4.1 Pre-processing of the data 

The data should be pre-processed if they are contaminated 
by Poissonian noise. The Poissonian data Ip are transformed 
into data Yq with (additive) Gaussian white noise of stan- 
dard deviation ac = 1: 



Yg 



Yp + 



(6) 



(Anscombe 1948), which can then be de-noised as discussed 
in Sects 4.2-4.5. The Anscombe transformation has the re- 
markable property to help achieving normalization, variance 
stabilization and additivity (see Stuart & Ord 1991). On the 
other hand, it has a tendency to fail locally where the data 
have small values or large variations (e.g., Kolaczyk 1997; 
see also Starck, Murtagh & Bijaoui 1998). On the whole, 
such an ingenious method produces very good results if the 
data are post-processed appropriately (see Sect. 4.6). 

4.2 Choice of the wavelet 

The wavelets included in the code belong to three families: 
the Daubechies wavelets, the bi-orthogonal spline wavelets 
and the reverse bi-orthogonal spline wavelets. The last two 
families are intimately related: the decomposition wavelets 
of the 'reverse' family are the reconstruction wavelets of 
the other, and vice versa. Such wavelet families were intro- 
duced by Daubechies (1992). Various representatives of the 
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Figure 5. Original and thresholded wavelet coefficients (top), and de- noised data vs. the model data (bottom) for various values of the 
parameter A'^t min defined in the text. Also specified are the signal-to-noise ratio SNR in the de-noised data and the de-noising factor 
DF. The reference case illustrated in Fig. 3 corresponds to Aft min = 64. 



wavelets included in the code have already been shown in 
Fig. 1 and discussed in Sect. 2.4. The others have interme- 
diate properties, or are the reverse of those illustrated. The 
most useful wavelets of the code are specified at the end of 
this section. 

How does the choice of the wavelet affect de-noising? 
Let us consider the representative wavelets mentioned above, 
since the others have intermediate or similar effects. The 
reference case illustrated in Fig. 3 corresponds to 'bior4.4', 
which we have chosen to discuss the basics of de-noising 
(see Sect. 3.2). Fig. 4 shows the effects of the other choices. 
The wavelets 'haar' and 'daub 4' give rise to large irregu- 
lar coefficients in the two coarsest details, which exceed the 
threshold, and to small but significant irregularities in the 
de-noised data. The resulting signal-to-noise ratio and de- 
noising factor are worse than in the reference case. In con- 
trast, 'rbio6.8' de-noises ahnost as well as 'bior4.4'. 

Let us then explain the key points for a successful choice 
of the wavelet. The conditions that should be fulfilled for a 
good de-noising are three, (i) The wavelet should satisfy the 
requirements for a good compression, which are discussed in 
Sect. 3.1. (ii) The wavelet should be orthogonal. Orthog- 
onality implies that Gaussian white noise in the data is 
transformed into Gaussian white noise in the wavelet coeffi- 
cients. This is convenient for a correct threshold identifica- 
tion, which is discussed in Sect. 4.4. (iii) The size of the inter- 
val where the wavelet differs significantly from zero should 
be comparable to the resolution needed by the data, or to the 
effective spatial resolution of the simulations. Such interval 
must not be confused with the support of the wavelet. Con- 
ditions (i)-(iii) cannot be fulfilled equally well. The best al- 
ternative is represented by bi-orthogonal wavelet pairs that 
are also quasi-orthogonal, which are consistent with a res- 
olution of about three to four bin/mesh sizes. The selected 
wavelets are: 'bior4.4' and 'bior6.8', together with their re- 
verse 'rbio4.4' and 'rbio6.8' (cf. Fig. 1 and Sect. 2.4, and 
recall what 'reverse' means). In 'biorni.n2' the decomposi- 
tion and reconstruction wavelets have ni and n2 vanishing 



moments, respectively; in 'rbio 711.712' vice versa. We cannot 
provide further reliable guide-lines on the most appropriate 
choice of the wavelet. It depends on the problem and can be 
found through the optimization trial discussed in Sect. 6. 

4.3 Fast wavelet transform 

From the computational point of view, the choice of the 
wavelet corresponds to the choice of a set of filters for the 
fast wavelet transform and its inverse. For a bi-orthogonal 
wavelet, they are: the high-pass and low-pass decomposition 
filters Qi and hi, and the high-pass and low-pass reconstruc- 
tion filters Qi and hi, respectively. In the orthogonal case, 
Qi = Qi and hi = hi. The coefficients of hi and hi are tabu- 
lated and centred as close as possible to i = 0, while those of 
Qi and Qi are computed from the relations Qi+i = {—ly^^h-i 
and g^+l = (-l)'+^/i_i. The filters are padded with zeros 
so as to be defined for i = —M, . . . ,M (M even), and to be 
consistent with the formulae for the transforms. In particu- 
lar, 'bior4.4' and 'rbio4.4' have M = 6, while 'biorG.S' and 
'rbio6.8' have A4 — 10. (The detailed relations of such filters 
to the wavelets and scaling functions are very complicated 
and irrelevant to our context; see Goedecker 1998.) 

One step of the (forward) fast wavelet transform re- 
places the current approximation Ai{Nt), of size A^t, with 
a coarser approximation Ai{Nt/2) and detail Di{Nt/2), of 
size Nt/2: 

M 

MN,/2)= h,A,+2^{N,), (7) 

M 

A(iVt/2)= ^ ~g,A,+2^{N,), (8) 

i=-J\/4-l 

where the index j + 2i is wrapped around when it gets out 
of the range (periodic boundary conditions; see Goedecker 
1998). Initially, iVt = iVd and A^ = Xi, Xi{Nd) being the 
original data. The transform ends when A^t — A^tmin, so 
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Figure 6. Action of the 2-D fast wavelet transform and structure of the transformed data. The original data and the approximations 
are represented without patterns. The details corresponding to vertical/horizontal/diagonal variations are represented by patterns of 
horizontal/ vertical/diagonal lines (features along a certain direction have maximum gradient along the perpendicular direction). 



that the transformed data Yi{Nd) consist of the coarse ap- 
proximation yli(7Vtmin/2) and the sequence of finer and finer 
details A(iVtmin/2), A(iVtmin), Di{Nd/2). Note that 
Nt min is a free parameter of the code. If we assume that A'^d 
is a power of 2, then A'tmin is also a power of 2 and such 
that 2 < A'tmin < Nd- A complete transform corresponds to 
A^tmin ~ 2, but this value does not necessarily mean a good 
de-noising. A more general assumption is that A'd contains 
a power of two, which has obvious implications for A'tmin- 
Data of different size can be padded (see Sect. 6). 

How does the value of the parameter A'tmin affect de- 
noising? The reference case illustrated in Fig. 3 corresponds 
to A'tmin = 64 (A'd = 2048). Fig. 5 shows the effects of 
other values. For A'tmin = 2048, only the smallest-scale 
noise is removed so the processed data are nearly as noisy 
as the original data. For A'tmin = 128, there is residual noise 
on large scales, and the resulting signal-to-noise ratio and 
de-noising factor are worse than in the reference case. For 
At min ~ 2, the de-noising is complete and nearly as good 
as for A'tmin ~ 64. Nevertheless, there are large anoma- 
lous coefficients in the five coarsest details, which exceed 
the threshold, and small anomalies in the de-noised data. 

Let us then explain which values of A't min imply a good 
de-noising. It must be A'tmin Nd, otherwise de-noising is 
incomplete; and besides A'tmin >4M, otherwise the wavelet 
becomes too dilated in comparison with the size of the data 
and wrap-around effects become significant. (Cosmological 
simulations are peculiar in this context; see Sect. 5.3.) The 
best value of A'tmin depends on the problem and can be 
found through the optimization trial discussed in Sect. 6. 

Finally, we point out the (non-obvious) differences be- 
tween the 2-D or 3-D fast wavelet transform and the 1-D 
case. Fig. 6 shows how the transform acts on 2-D data. In 
general, given n-D data of size A'd , the first step of the 
transform decomposes them into 2" parts of size (A'd/2)": 
1 approximation and 2" — 1 details, one for each axis and 
each diagonal. It is done by 1-D transforming the data along 
each index, for all values of the other indices, consecutively. 
Then the discussion basically follows the 1-D case, except 
that the complexity of the transform increases by a factor of 
n2""V(2" - 1) with respect to AMN^. The generalization 
to data of size A'd! ■ ■ • A'dn is plain. 



4.4 Thresholding of the wavelet coefficients 

The heart of de-noising consists of identifying a correct 
threshold, and deciding which type of wavelet coefficients 
are to be thresholded and how. Note the difference between 
thresholding and smoothing, where the detail coefficients be- 
low a given scale are set to zero independent of their value. 
In the following, we discuss thresholding and introduce the 
options of the code. 

As pointed out in Sect. 4.2, a correct threshold can be 
identified if the wavelet is orthogonal, or quasi-orthogonal. 
The threshold T is proportional to the standard deviation 
of noise a, and the proportionality factor K depends on the 
size of the data: 



T = K{Nd)a. 



(9) 



If the standard deviation is not given (Csd = 'ng'), as in the 
case of Gaussian white noise, then it is robustly estimated 
through the median absolute deviation of the finest detail: 



1 



MAD[D,{Nd/2)] . 



(10) 



0.6745 

A robust estimator and the finest detail are used for min- 
imizing the contribution of outlying wavelet coefficients, 
which are not caused by noise. If the standard deviation 
is given (Csd ~ 'g')j ^ in the case of Poissonian noise, then 



(J ~ 1 . 



(11) 



Concerning K{Nd), it is rigorously determined so that the 
threshold matches both the noise level and the significance 
level of the wavelet coefficients, according to probability 
criteria. We can decide between two functional forms for 
K{Nd)- One corresponds to a higher threshold (Ct — 'h'), 
which is more effective but less safe: 



K{Nd) = ^2 In A'd. 



(12) 



The other corresponds to a lower threshold (Ct = '1'), and is 
approximated analytically as: 



K{Nd 







ff A^d < 32 , 



0.3936 -f 0.1829 logj A^d else . 



(13) 



Next, the wavelet coefficients to threshold can be either the 
details (Cct = 'd'): 
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Figure 7. Original and thresholded wavelet coefficients (top), and do- noised data vs. the model data (bottom) for various thresholding 
options and the 'haar' wavelet. Also specified are the signal-to-noise ratio SNR in the de-noised data and the de-noising factor DF . The 
reference case illustrated in Fig. 3 corresponds to Ct = 'h', Cct = 'd', Ctn = 'h' and the 'bior4.4' wavelet [for the corresponding 'haar' 
case cf. Fig. 4 (left)]. 



Wi = A:(iVtmin/2), . . . , A(iVd/2) ; (14) 
or the approximation & the details (Cct = 'a&d'): 

Wi = A,(iVtmin/2), A:(iVtmin/2), . . . , A(iVd/2) . (15) 

The last option concerns the thresholding (method) , named 
as in the literature. It can be either hard (Ctn = 'h'): 







ii\W4<T, 
else ; 



or soft (Ctn — 's') 



TJ7 



sign(H/0(im|-r) 



else . 



(16) 



(17) 



So Ctn ~ 's' means that even the wavelet coefficients above 
T are thresholded, and this is done shrinking them by T. 

How do the thresholding options affect de-noising? The 
reference case illustrated in Fig. 3 corresponds to Ct = 'h', 
Cct = 'd' and Ctn = 'h' (Csd = 'ng')- Fig- 7 shows the effects 
of other options. As the situation is degenerate, we consider 
the 'haar' wavelet instead of 'bior4.4', since it reduces the 
degeneracy and its effects have already been shown [cf. Fig. 
4 (left)]. For Ct = T, the threshold is exceeded by several 
noisy detail coefficients, which give rise to spikes in the de- 
noised data. For Ctn = 's', the soft thresholding of the detail 
coefficients over-regularizes the de-noised data, softening the 
maxima and minima. If in addition Cct = 'a&d', then the 
thresholding of the approximation coefficients biases the de- 
noised data. 

Let us then explain which thresholding options imply 
a good de-noising. We suggest the more effective option 
Ct = 'h' for standard data, unless they are expected to have 
meaningful irregularities below the maximum noise level; 
while we recommend the safer option Ct = T for simulations. 
In addition, it must be Cct ~ 'd' and Ctn ~ 'h', otherwise 
the de-noised data get biased and over-regularized, respec- 
tively. (In advanced de-noising, Cct and Ctn are replaced by 
a more useful parameter; see Sect. 5.1.) Finally, it is nat- 



ural to opt for Csd ~ 'ng' in the case of Gaussian white 
noise, and for Csd = 'g' in the case of Poissonian noise. If 
the Poissonian data have a sufficiently high signal-to-noise 
ratio, such that the estimated a ~ 1, then Csd ~ 'ng' fine- 
tunes the accuracy of de-noising. The algorithm gets slightly 
slower since the computation of the median has complexity 
O(A'd). Therefore Csd ~ 'ng' may be a better option than 
Csd ~ 'g' for standard data, not for simulations (where the 
signal-to- noise ratio is low and speed is a primary factor). 

Finally, thresholding in 2D or 3D is similar to the 1- 
D case and the generalization to data of size A'di • • • A'^dn 
is plain, except for the more complicated structure of the 
wavelet coefficients. 



4.5 Inverse fast wavelet transform 

One step of the inverse (backward) fast wavelet transform 
replaces the current approximation Ai{Nt/2) and coarsest 
detail Di{Nt/2) with a finer approximation Ai(Nt): 



M/2-1 
j=-M/2 



(18) 



A//2-1 

A2»+i(iVt) = Vft; 



j = -A//2 



i^,_,(Art/2) -f,g2,+iA-j(iVt/2) , (19) 



where the index i — j is wrapped around when it gets out of 
the range (see Goedecker 1998), A^t goes from A'tmin to A^d, 
and so on (see Sect. 4.3). 

The inverse fast wavelet transform can be used for plot- 
ting wavelets, as in Fig. 1. Consider the data Xi = Si„ and 
the inverse transformed data Yi. A discrete approximation 
of i/'rcc(a;) or (j}iac{x) can be computed from Yi through the 
following operations: scaling, translation, normalization and 
possibly wrap-around. The accuracy of the approximation, 
the type of function and the parameters of the operations 
depend on A^d, A'tmin and n. Including the reverse set of 
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Figure 8. Partial de-noising at a pre-assigned level. Original and thresholded wavelet coefficients (top), and de-noised data (middle) 
for various values of the contraction parameter C; also shown for comparison are realizations of Poissonian data with times more 

'counts' than in the original noisy data (bottom). The signal-to-noise ratio SNR and the de-noising factor DF are specified. The reference 
case illustrated in Fig. 3 corresponds to C = (total de- noising). The implication for simulations is explained in the text. 



filters produces ipdccix) or </idcc(a;). In Fig. 1, we have set: 
A^d = 2^*, iVtmin = 2' , n = 3A''tmin/4 for the wavelet plots 
and n — A'tmin/4 for the scaling- function plots. In both 
plots, the discrete argument is Xi = {i~b)/a and the function 
is f{xi) = \/aYi, with a = 2A'^d/A'tmin and b = Nd/2. Such a 
set-up provides an accurate approximation and avoids wrap- 
around, and thus it can be used for plotting wavelets in gen- 
eral. 



BIAS 




This means that, with very little effort, we can subtract the 
bias almost completely from the de-noised data. And, if even 
a slight global bias is unacceptable, then we can compute it 
numerically and subtract it completely from the de-noised 
data. 



4.6 Post-processing of the data 

The data should be post-processed in the case of Poissonian 
noise. Post-processing consists of inverse Anscombe trans- 
forming plus two corrections. The first correction is needed 
if the Anscombe transformation fails locally (see Sect. 4.1), 
giving rise to small negative values in the de-noised data. We 
can correct such values by setting them to zero. The second 
correction is required because the Anscombe transformation 
introduces a local bias in the data. That is, if fip is the local 
mean of Yp and is the local mean of Yq, the mean /ip es- 
timated by inverse transforming is not equal to fip, and 
their difference is the local bias of the transformation. Starck 
et al. (1998) have implied that the bias is multiplicative and 
unbounded, while Kolaczyk (1997) has implied that the bias 
is additive and bounded but has not estimated it. Indeed, 
the comprehensive book by Stuart & Ord (1991) shows that 
the bias of the Anscombe transformation is additive and 
bounded, and can be estimated analytically: 



5 EXPLORING ADVANCED DE-NOISING 

Consider a simulation where the particle distribution is de- 
noised at each timestep. If initially the de-noising factor is 
DFo, then the de- noised model has the same signal-to-noise 
ratio as a noisy model with DFq times more particles. This 
follows from the fact that the noise is basically Poissonian, 
and hence the signal-to-noise ratio is proportional to the 
square root of the average number of particles per cell. Be- 
sides, as DF depends more on the de-noising ability of the 
wavelet method than on the characteristics of the particle 
distribution, we can draw a more general conclusion: the 
de-noised simulation itself is roughly equivalent to a noisy 
simulation with DFq times more particles (cf. Paper I). Until 
now we have learned how to de-noise so as to get the largest 
DF (see Sects 3.2 and 4). On the other hand, in simulations 
we do not always want to suppress noise totally. We may 
instead want to reduce it partially in order to understand 
and control its effects. In this section, we learn how to carry 
out such an advanced de-noising. 
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Figure 9. Accuracy of partial de-noising at a pre-assigned level. 
The signal-to-noise ratio SNR is shown as a function of the 
contraction parameter C for the de-noised data and for sub- 
Poissonian data. The sub-Poissonian data have local standard 
deviation cr^^^, = C^fi^ub a^nd local mean fi^uh = MPi where 
fip refers to the original data with Poissonian noise; hence 
{SNR)sub = {SNR)p/C. The accuracy is better than 10% for 
C> I/V20, and it gets worse than 20% for C< l/VSO- The im- 
plication for simulations is explained in the text. 

5.1 Partial de- noising at a pre-assigned level 

5.1.1 Method and implementation 

Can we de-noise a simulation so as to make it equivalent to 
a simulation with F times more particles, for a pre-assigned 
level r? Yes! And the idea is the following. Recall what hard 
thresholding of the details means (see Sect. 4.4), and con- 
sider the wavelet coefficients below the threshold. If we con- 
tract them by C, instead of setting them to zero, then the 
noise level decreases by the same factor whereas the 'signal' 
does not change. Hence the signal-to-noise ratio increases by 
a factor of 1/C, and the simulation becomes equivalent to a 
simulation with times more particles. Thus the prob- 

lem is solved if we set C = 1/ \/r. (For an analogous thresh- 
olding in the context of speech signals see Storm 1998.) 

We now illustrate this idea in the simple, but instruc- 
tive, context of standard data. Fig. 8 shows that partial 
de-noising at a pre-assigned level works as expected. Note 
that this type of de-noising is meant to turn data with Pois- 
sonian noise into 'sub-Poissonian' data. In such data the 
original Poissonian deviations from the local mean are con- 
tracted by C, while obviously the de-noised data are subject 
to an estimation error. Fig. 9 shows that, as expected, the 
accuracy of partial de-noising at a pre-assigned level is very 
good except for C <2/{DF)tot, where {DF)tot refers to the 
case of total de- noising {C — 0). 

5.1.2 Bench-marks 

Let us then explore this idea in simulations of disc galaxies. 
The examination is based on four bench-marks, originally 
introduced in Paper I. 

(i) The first natural bench-mark is the comparison be- 
tween the initial models. 



(ii) The second bench-mark concerns the fragmentation 
of a cool galactic disc, which is the onset of a gravitational 
instability (see, e.g., Binney & Tremaine 1987). A rotating 
disc with low velocity dispersion is gravitationally unsta- 
ble and therefore sensitive to perturbations, which are am- 
plified and break the initial axial symmetry of the system 
(e.g., Semelin & Combes 2000; Huber & Pfenniger 2001). 
The time that characterizes symmetry breaking clearly de- 
pends on the initial amplitude of the perturbations, for small 
perturbations need a long time to grow into an observable 
level. In particular, this is true for the fluctuations imposed 
by granular initial conditions. Thus the symmetry-breaking 
time is a clear diagnostic for quantifying the effect of noise 
on the simulation. 

(iii) The third bench-mark concerns the heating following 
the fragmentation. This is a fundamental process in the dy- 
namical evolution of disc galaxies, which is induced by grav- 
itational instabilities via the outward transport of angular 
momentum and energy (see, e.g., Binney & Tremaine 1987). 
Therefore this bench-mark has a clear physical motivation. 
When spiral gravitational instabilities reach a sufficiently 
large amplitude, the velocity dispersion of the disc starts 
to increase by collective relaxation (e.g., Zhang 1998; Griv, 
Gedalin & Yuan 2002). The heat produced in a dynamical 
time is low if the initial amplitude of the instabilities is small. 
Thus the increase of velocity dispersion is another diagnostic 
for quantifying the effect of noise on the simulation. 

(iv) The fourth bench-mark concerns the accretion fol- 
lowing the fragmentation. This is also a fundamental pro- 
cess in the dynamical evolution of disc galaxies (see, e.g., 
Binney & Tremaine 1987). Therefore this bench-mark also 
has a clear physical motivation. The amplification of spiral 
gravitational instabilities produces not only heating but also 
re-distribution of matter in the disc, which appears more ev- 
idently as accretion near the centre (e.g., Zhang 1998; Griv 
et al. 2002). The mass accreted in a dynamical time is low 
if the initial amplitude of the instabilities is small. Thus the 
peak of mass density is still another diagnostic for quantify- 
ing the effect of noise on the simulation. 

We consider the same basic simulation as in Paper I, which 
has = 10^ particles. We de-noise it choosing the 'rbio6.8' 
wavelet, and setting Atmin ~ 16 and C = l/-\/T0, so as 
to make it equivalent to a simulation with 10 times more 
particles (the thresholding options are the usual ones for 
simulations; see Sect. 4.4). The conservation of angular mo- 
mentum and energy is not significantly affected. In fact, the 
deviations are less than 0.02% and 0.04% per dynamical 
time, respectively, and compare well with those typical of 
the code (Combes et al. 1990). We also run the noisy sim- 
ulation with A = lO''. This suite of simulations is sufficient 
for the present purpose. For further comparison see the ex- 
tensive survey presented in Paper I (the de-noised simulation 
has C = 0). 

Figs 10-13 illustrate that partial de-noising at a pre- 
assigned level works as expected, and the agreement is very 
good, except that the de-noised simulation takes a little 
longer time to form the initial transient structures (cf. Fig. 
11). The reason for this imperfection is twofold: it concerns 
the de-noising itself (threshold) and the initial conditions 
(noisy starts). 

Threshold. At the beginning of the simulation, there is no 
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Figure 10. Partial dc-noising at a pre-assigned level in action: comparison between the initial models. The set of simulation models has 
a physical grid of A^c = = 256^ cells and cell size Ac = 0.25 kpc. The particle distribution is shown for the noisy model with TV = 10^ 
particles (left), the noisy model with A'^ = 10® (middle), and the de- noised model with N = 10^ and contraction parameter C = 1/y/TO 
(right). In each model the signal-to-noise ratio is SNR ~ 5.7, 17.8, 17.1, respectively. As expected, the accuracy of partial de-noising at a 
pre-assigned level is very good for such initial models. 





c 




Figure 11. Partial de-noising at a pre-assigned level in action: fragmentation of a cool galactic disc, a: de-noised simulation with A'^ = 10^ 
and C = l/vTO; b: noisy simulation with = 10^; c: noisy simulation with N = 10®. The initial models are the same as in Fig. 10. 
For each simulation, the particle distribution is shown from Myr to 150 Myr at intervals of 50 Myr (from left to right). The time t at 
which the initial axial symmetry breaks is a measure of the effect of noise on the simulation: a long r means a weak effect. As expected, 
T increases from b to c; we also notice that r is a little longer in a than in c. 



way to differentiate instabilities from amplified noisy fiuc- 
tuations. Thresholding weakens the initial instabilities until 
the relevant wavelet coefficients exceed the threshold. The 
usual threshold tends to be slightly too high, and therefore 
the onset of the initial instabilities is delayed. 

Noisy starts. We know that the initial particle positions 
and velocities are noisy. We also know that the particle 
density is partially de-noised, and the computed field has 
a consistent noise level. This means that the excess of noise 
remains confined in phase space and does not propagate dy- 
namically. In fact, in the de-noised simulation the amplitude 



of the statistical fluctuations is similar to the noisier case, 
whereas the evolution of the relevant quantities is similar 
to the less noisy case (cf. Figs 12 and 13). On the other 
hand, the excess of noise makes instabilities less coherent, 
and therefore it delays their onset. 

Thus it is hard to evaluate the accuracy of partial de- 
noising at a pre-assigned level for simulations, even if the 
accuracy is better than a few per cent for the initial mod- 
els (cf. Fig. 10). But the conclusion is strong anyway: our 
method, and code, can be used for understanding and con- 
trolling the effects of noise on simulations. 
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Figure 12. Partial de-noising at a pre-assigned level in action: heating following tlie fragmentation of a cool galactic disc. The simulations 
are the same as in Fig. 11. The velocity dispersion tr is shown as a function of radius R at the initial and final times, and as a function of 
time t at an intermediate radius. The increase of velocity dispersion Aa(R) from the initial to the final value is a measure of the effect 
of noise on the simulation: a small Act means a weak effect. In the simulations, except b, heating is significant only for ij< 12 kpc. As 
expected, Act^ > ■^""c and Aca ~ Acc. 



t = 250 Myr 

T I , I , I 1 I r 




R [kpc] 

Figure 13. Partial de-noising at a pre-assigned level in action: 
accretion following the fragmentation of a cool galactic disc. The 
simulations are the same as in Fig. 11. The mass density S is 
shown as a function of radius R at the initial and final times. The 
peak of final mass density S near the centre is a measure of the 
effect of noise on the simulation: a low S means a weak effect. As 
expected, S]-, > Ec and Sa ~ Sc. 



5.1.3 Controlling the effectiveness of de-noising 

Before exploring more advanced aspects of de-noising, let us 
reflect on the main differences between partial de-noising at 
a pre-assigned level and total de-noising, and explain what 
we mean by noise control (or analogous terms). Consider a 
simulation with, say, A'^ — 10^ particles. And suppose that 
we decide beforehand to make it equivalent to a simulation 
with, say, A'^prc = 1.8 x 10^ particles (pre-assigned number). 
Then we set the contraction parameter C — l/vTS, and run 
the simulation. The partially de-noised model will accurately 
mimic a noisy model with F = 18 times more particles. This 
is noise control. On the other hand, the accuracy of partial 
de-noising at a pre-assigned level deteriorates for small C 
(cf. Fig. 9 and its discussion). In the limit C ~* the ef- 
fectiveness of de-noising becomes maximum, but we cannot 
predict it with sufficient accuracy unless we compare the 
initial models quantitatively (see Paper I). If the method 
were perfect we would have an improvement in the equiva- 
lent number of particles by a factor of 1/C^ — > oo, whereas 
in practice the improvement is by a factor of about 100 (cf. 
Paper I). In such a case we do not have control over noise, 
but we exploit total (maximum) de-noising. 

5.2 Partially noisy starts and adaptive de- noising 

Can we achieve an even better noise control? Yes, in prin- 
ciple, and the idea is the following. We should first impose 
appropriate initial conditions so as to make the model equiv- 
alent to a model with F times more particles, without de- 
noising it. Such partially noisy starts can be generated by 
setting up a fraction l/F of the particles with noisy starts. 
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Display De-Noised Data 
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Include Wavelet Filters 



Fast Wavelet Transform 



Enter Scale Parameter 



Threshold Wavelet Coefficients 



Inverse Fast Wavelet Transform 



Enter Thresholding Options 



Enter Contraction Parameter 
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Figure 14. Flowchart of the code. 



O 'bior4.4', 'rbio4.4' (M=6) 
O 'bior 6.8', 'rbio 6.8' (M= 10) 
ir Optimize, in galaxy simulations 



* Optimize, in galaxy simulations 



• C sd = 'g'' for simulations and 
Poissonian data 

• = ng', for Gaussian data 



• Ct = 'h', for standard data 

• Ct = '1', for simulations 



• < C< 1 , for partial de-noising 
at a pre-assigned level 

• C= 0, for total de-noising 



and the rest with quiet starts (quiet starts are common 
in plasma simulations; see, e.g., Dawson 1983; Birdsall & 
Langdon 1991). Doing so, the initial particle distribution 
is basically sub-Poissonian and its signal-to-noise ratio is 
higher than in the Poissonian case by a factor of y/T. On 
the other hand, the noise has a natural tendency to become 
fully Poissonian in few dynamical times, as a reaction of the 
system to the imposed order and hence reduced entropy. We 
should then de-noise the simulation consistently. At the ini- 
tial time, the threshold T is lower than the usual one by 
a factor of l/\/T and the contraction parameter C is unity 



(no de- noising). When the noise level increases, T should be 
increased accordingly and C should be decreased in inverse 
proportion (in the case of quiet starts, C = 0; see also Pa- 
per I). Such an adaptive de-noising is not yet implemented 
in the code. 



5.3 Partial de-noising up to a given scale 

Cosmological simulations are peculiar with respect to galaxy 
and plasma simulations. The initial conditions consist of set- 
ting up a quiet uniform particle distribution, and of impos- 
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ing small random fluctuations with Gaussian statistics and 
a given power spectrum (e.g., Efstathiou et al. 1985; Sy- 
los Labini et al. 2003). Such fluctuations are amplified by 
gravitational instability and form stuctures. On the other 
hand, Poissonian noise develops on the same time-scale and 
therefore affects structure formation. The onset of Poisso- 
nian noise is especially quick in cold dark matter simula- 
tions, where structures form bottom-up and the first virial- 
ized systems contain a small number of particles (e.g., Bin- 
ney & Knebe 2002; Diemand et al. 2004). 

Thus de-noising cosmological simulations is a very com- 
plex and demanding problem: the method should remove 
noise without affecting the physical random fluctuations. An 
ad hoc solution for cold dark matter simulations may be to 
de-noise them only over a range of scales that is adapted to 
the phase of clustering: from the cell size to the size of the 
structures that have formed latest (see Paper I). Partial de- 
noising up to a given scale is implemented in the code, but 
it is not tested in this context; the scale parameter is Ntmin 
(see Sect. 4.3). An upper scale of 2" cell sizes corresponds 
to A^tmin ~ Ad/2". At the beginning of the simulation, be- 
fore the first virialized systems have formed, the upper scale 
should match the smallest physical scale unaffected by the 
A-body method, so Atmin should be set to Ad/2 or Ad/4. 
Analogous solutions may be found in the context of other 
cosmological models. 



6 HOW TO USE THE CODE 

A fiowchart of the code is illustrated in Fig. 14 (the symbols 
are standard; see, e.g., Nyhoff & Leestma 1997). It summa- 
rizes the most useful information given in Sects 4 and 5, 
without repeating the relevant definitions. In this section, 
we discuss such practical points in detail. Supplementary 
information is given in the readme-flle of the code distribu- 
tion. 

The code can be used for de-noising A-body simula- 
tions, and 1D-3D standard data with Poissonian noise or 
additive white Gaussian noise. It contains include-files for 
many orthogonal and bi-orthogonal wavelet filters, and also 
routines for the fast wavelet transform and its inverse. The 
number of data points should contain powers of two. Data 
of different size can be padded: add zeros, or extend the 
data so that their boundary values match smoothly. Smooth 
padding is better because it reduces wrap-around effects. 
Data with multiplicative and/or coloured Gaussian noise can 
also be de-noised. In the first case, pre-process the data by 
taking their logarithm, de-noise them in the usual way and 
post-process. In the second case, compute the standard de- 
viation of noise on each scale from the corresponding detail, 
and threshold the wavelet coefficients accordingly. 

We now explain how to include our add-on code in 
particle-mesh A-body codes (e.g.. Combes et al. 1990; 
Pfenniger & Friedh 1993; Klypin & Hohzman 1997). The 
proper de-noising subroutine should be called just after the 
mass/charge assignment. Note that the right argument is 
the number of particles per cell in the active grid, not the 
mass/charge distribution in the whole mesh. Therefore the 
subroutine needs a simple interface for the conversion of 
such arrays. The specific form of interface depends on For- 
tran details. If there are various particle species, which rep- 



resent components with different collision properties, then 
each species can have its own type of de-noising. The case 
of polar grids is similar to the Cartesian case. In fact, for 
our purpose we can regard the space spanned by the cell 
indices as Cartesian and the particle distribution defined 
there as evenly sampled. In addition, the boundary values 
of the particle distribution match smoothly, except near the 
intersections between the radial boundaries and the equato- 
rial plane. So smooth padding may be justified even if the 
number of radial cells already contains a power of two. In 
order to reduce wrap-around effects significantly, the thick- 
ness of the padding layer should be comparable to the size of 
the wavelet filters. Note that such extra cells are only used 
for de-noising purposes and do not enter into the A-body 
computation itself. The case of other grid geometries is anal- 
ogous. It is not yet clear how to include our add-on code in 
other types of A-body codes. 

Let us finally remark that the physical performance of 
the code depends on how it is used. In order to get a very 
good performance, follow the guide-lines of Sects 4 and 5 
and the practical advice of this section. The performance 
can be optimized in the case of galaxy simulations, since the 
initial model is noisy and the theoretical particle distribu- 
tion is known. The degrees of freedom are the choice of the 
wavelet and the value of the scale parameter (cf. Sects 4.2 
and 4.3). The optimization consists of a simple trial: vary 
such degrees of freedom so as to get the largest de-noising 
factor, and check the visual quality of the de-noised model. 
In the case of cosmological simulations of structure forma- 
tion in the early Universe, the value of the scale parameter is 
a critical issue (cf. Sect. 5.3), while an appropriate choice of 
the wavelet may be guessed considering the characteristics 
of such structures. For example, in cold dark matter sim- 
ulations we would choose the 'bior4.4' wavelet (cf. Fig. 1) 
since the haloes that form are cuspy. In the cases of plasma 
simulations and standard data, we cannot give more speciflc 
instructions than those of Sects 4 and 5. 



7 CONCLUSIONS 

A-body simulations of structure formation in the early Uni- 
verse, of galaxies and plasmas are limited crucially by noise, 
whose effects are subtle and not yet fully understood. In Pa- 
per I, we have introduced an innovative multi-scale method 
of noise reduction based on wavelets, which promises marked 
advances in those contexts. In this paper, we have discussed 
such a method and its code implementation. We have also 
explained how to include our code in the simulator's A-body 
code, and how to use it for de-noising standard data. The 
code is available on request from the first author. The major 
conclusions of this paper are pointed out below. 

(i) This is the first wavelet add-on code designed for de- 
noising A-body simulations, and as such is meant to be a 
building block for more elaborate de-noising codes. We hope 
to have stimulated curiosity about the uses of our code, and 
we challenge simulators to apply it to physical problems 
where noise must be suppressed or controlled. 

(ii) The strength of the code is twofold. It improves the 
performance of simulations up to two orders of magnitude 
(cf. Paper I). Besides, it allows controlling the effects of 
noise: the A-body simulation can be made equivalent to a 
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simulation with a pre-assigned number of particles A'^prc, for 
Npie/N larger than unity and spanning one order of magni- 
tude. 

(iii) The weakness or rather small imperfection of the 
code is that noise-generated instabilities are not reproduced 
very well, in contrast to the induced dynamical evolution. 
Obviously, errors may follow from an incorrect use of the 
code. 

(iv) Finally, we believe that the performance of simu- 
lations can be further improved with a more appropriate 
pre-processing of the data. Fryzlewicz & Nason (2004) have 
shown that the Haar-Fisz transformation is better than the 
Anscombe transformation for pre-processing data with Pois- 
sonian noise, and that the computational time is comparable 
(see also Fryzlewicz 2003) . Work is in progress to investigate 
other relevant properties and uses of this transformation, be- 
fore including it in our code. 
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